Goto

Collaborating Authors

 polish space


The Many Faces of Adversarial Risk

Neural Information Processing Systems

Adversarial risk quantifies the performance of classifiers on adversarially perturbed data. Numerous definitions of adversarial risk--not all mathematically rigorous and differing subtly in the details--have appeared in the literature. In this paper, we revisit these definitions, make them rigorous, and critically examine their similarities and differences. Our technical tools derive from optimal transport, robust statistics, functional analysis, and game theory. Our contributions include the following: generalizing Strassen's theorem to the unbalanced optimal transport setting with applications to adversarial classification with unequal priors; showing an equivalence between adversarial robustness and robust hypothesis testing with -Wasserstein uncertainty sets; proving the existence of a pure Nash equilibrium in the two-player game between the adversary and the algorithm; and characterizing adversarial risk by the minimum Bayes error between a pair of distributions belonging to the -Wasserstein uncertainty sets. Our results generalize and deepen recently discovered connections between optimal transport and adversarial robustness and reveal new connections to Choquet capacities and game theory.


The Many Faces of Adversarial Risk

Neural Information Processing Systems

Adversarial risk quantifies the performance of classifiers on adversarially perturbed data. Numerous definitions of adversarial risk--not all mathematically rigorous and differing subtly in the details--have appeared in the literature. In this paper, we revisit these definitions, make them rigorous, and critically examine their similarities and differences. Our technical tools derive from optimal transport, robust statistics, functional analysis, and game theory. Our contributions include the following: generalizing Strassen's theorem to the unbalanced optimal transport setting with applications to adversarial classification with unequal priors; showing an equivalence between adversarial robustness and robust hypothesis testing with -Wasserstein uncertainty sets; proving the existence of a pure Nash equilibrium in the two-player game between the adversary and the algorithm; and characterizing adversarial risk by the minimum Bayes error between a pair of distributions belonging to the -Wasserstein uncertainty sets. Our results generalize and deepen recently discovered connections between optimal transport and adversarial robustness and reveal new connections to Choquet capacities and game theory.



A Unified Kantorovich Duality for Multimarginal Optimal Transport

arXiv.org Machine Learning

Multimarginal optimal transport (MOT) has gained increasing attention in recent years, notably due to its relevance in machine learning and statistics, where one seeks to jointly compare and align multiple probability distributions. This paper presents a unified and complete Kantorovich duality theory for MOT problem on general Polish product spaces with bounded continuous cost function. For marginal compact spaces, the duality identity is derived through a convex-analytic reformulation, that identifies the dual problem as a Fenchel-Rockafellar conjugate. We obtain dual attainment and show that optimal potentials may always be chosen in the class of $c$-conjugate families, thereby extending classical two-marginal conjugacy principle into a genuinely multimarginal setting. In non-compact setting, where direct compactness arguments are unavailable, we recover duality via a truncation-tightness procedure based on weak compactness of multimarginal transference plans and boundedness of the cost. We prove that the dual value is preserved under restriction to compact subsets and that admissible dual families can be regularized into uniformly bounded $c$-conjugate potentials. The argument relies on a refined use of $c$-splitting sets and their equivalence with multimarginal $c$-cyclical monotonicity. We then obtain dual attainment and exact primal-dual equality for MOT on arbitrary Polish spaces, together with a canonical representation of optimal dual potentials by $c$-conjugacy. These results provide a structural foundation for further developments in probabilistic and statistical analysis of MOT, including stability, differentiability, and asymptotic theory under marginal perturbations.


A Foundational Theory of Quantitative Abstraction: Adjunctions, Duality, and Logic for Probabilistic Systems

arXiv.org Artificial Intelligence

The analysis and control of stochastic dynamical systems rely on probabilistic models such as (continuous-space) Markov decision processes, but large or continuous state spaces make exact analysis intractable and call for principled quantitative abstraction. This work develops a unified theory of such abstraction by integrating category theory, coalgebra, quantitative logic, and optimal transport, centred on a canonical $\varepsilon$-quotient of the behavioral pseudo-metric with a universal property: among all abstractions that collapse behavioral differences below $\varepsilon$, it is the most detailed, and every other abstraction achieving the same discounted value-loss guarantee factors uniquely through it. Categorically, a quotient functor $Q_\varepsilon$ from a category of probabilistic systems to a category of metric specifications admits, via the Special Adjoint Functor Theorem, a right adjoint $R_\varepsilon$, yielding an adjunction $Q_\varepsilon \dashv R_\varepsilon$ that formalizes a duality between abstraction and realization; logically, a quantitative modal $μ$-calculus with separate reward and transition modalities is shown, for a broad class of systems, to be expressively complete for the behavioral pseudo-metric, with a countable fully abstract fragment suitable for computation. The theory is developed coalgebraically over Polish spaces and the Giry monad and validated on finite-state models using optimal-transport solvers, with experiments corroborating the predicted contraction properties and structural stability and aligning with the theoretical value-loss bounds, thereby providing a rigorous foundation for quantitative state abstraction and representation learning in probabilistic domains.


A theoretical basis for model collapse in recursive training

arXiv.org Artificial Intelligence

Our analysis will draw heavily upon the three topics in probability theory mentioned above. We briefly summarize the relevant results here. These can be found respectively, in [3] (see also [12] for a more extensive treatment), [11], and [2] (see also [9] for a more extensive treatment), respectively. A. Convergence of probability measures: Let S be a Polish space, i.e., a separable topological space with its topology compatible with a complete metric. Let B denote its Borel σ -field, i.e., the smallest σ -field containing its open sets.



Families of Optimal Transport Kernels for Cell Complexes

arXiv.org Machine Learning

Recent advances have discussed cell complexes as ideal learning representations. However, there is a lack of available machine learning methods suitable for learning on CW complexes. In this paper, we derive an explicit expression for the Wasserstein distance between cell complex signal distributions in terms of a Hodge-Laplacian matrix. This leads to a structurally meaningful measure to compare CW complexes and define the optimal transportation map. In order to simultaneously include both feature and structure information, we extend the Fused Gromov-Wasserstein distance to CW complexes. Finally, we introduce novel kernels over the space of probability measures on CW complexes based on the dual formulation of optimal transport.


Leveraging Optimal Transport for Distributed Two-Sample Testing: An Integrated Transportation Distance-based Framework

arXiv.org Machine Learning

This paper introduces a novel framework for distributed two-sample testing using the Integrated Transportation Distance (ITD), an extension of the Optimal Transport distance. The approach addresses the challenges of detecting distributional changes in decentralized learning or federated learning environments, where data privacy and heterogeneity are significant concerns. We provide theoretical foundations for the ITD, including convergence properties and asymptotic behavior. A permutation test procedure is proposed for practical implementation in distributed settings, allowing for efficient computation while preserving data privacy. The framework's performance is demonstrated through theoretical power analysis and extensive simulations, showing robust Type I error control and high power across various distributions and dimensions. The results indicate that ITD effectively aggregates information across distributed clients, detecting subtle distributional shifts that might be missed when examining individual clients. This work contributes to the growing field of distributed statistical inference, offering a powerful tool for two-sample testing in modern, decentralized data environments.


On the Consistency of Kernel Methods with Dependent Observations

arXiv.org Machine Learning

The consistency of a learning method is usually established under the assumption that the observations are a realization of an independent and identically distributed (i.i.d.) or mixing process. Yet, kernel methods such as support vector machines (SVMs), Gaussian processes, or conditional kernel mean embeddings (CKMEs) all give excellent performance under sampling schemes that are obviously non-i.i.d., such as when data comes from a dynamical system. We propose the new notion of empirical weak convergence (EWC) as a general assumption explaining such phenomena for kernel methods. It assumes the existence of a random asymptotic data distribution and is a strict weakening of previous assumptions in the field. Our main results then establish consistency of SVMs, kernel mean embeddings, and general Hilbert-space valued empirical expectations with EWC data. Our analysis holds for both finite- and infinite-dimensional outputs, as we extend classical results of statistical learning to the latter case. In particular, it is also applicable to CKMEs. Overall, our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing.